Assignment 3
1 General information
This assignment is related to Lecture 3 and BDA3 Chapters 2 and 3. Use Frank Harrell’s recommendations on how to state results in Bayesian two group comparisons (and note that there is no point null hypothesis testing in this assignment).
The maximum number of points for this assignment is 9.
We have prepared two quarto templates specific to this assignment to help you get started:
- a recommended template (html, qmd, pdf), which uses some additional packages and therefore requires a bit more set-up work to run, and
- a simple template (html, qmd, pdf), which does not use those additional packages and is therefore easier to get running.
Grading instructions:
The grading will be done in peergrade. All grading questions and evaluations for this assignment are contained within this document in the collapsible Rubric blocks.
- The recommended tool in this course is R (with the IDE RStudio).
- Instead of installing R and RStudio on your own computer, see how to use R and RStudio remotely.
- If you want to install R and RStudio locally, download R and RStudio.
- There are tons of tutorials, videos and introductions to R and RStudio online. You can find some initial hints on the RStudio Education pages.
- When working with R, we recommend writing the report using quarto and the provided template. The template includes the formatting instructions and shows how to include code and figures.
- Instead of quarto, you can use other software to make the PDF report, but the same formatting instructions should be followed.
- Report all results in a single, anonymous *.pdf file and submit it in peergrade.io.
- The course has its own R package aaltobda with data and functionality to simplify coding. The package is pre-installed in JupyterHub. To install the package on your own system, run the following code (upgrade = "never" skips the question about updating other packages):
```r
install.packages("remotes")
remotes::install_github("avehtari/BDA_course_Aalto", subdir = "rpackage", upgrade = "never")
```
- Many of the exercises can be checked automatically using the R package markmyassignment (pre-installed in JupyterHub). Information on how to install and use the package can be found in the markmyassignment documentation. There is no need to include markmyassignment results in the report.
- Recommended additional self-study exercises for each chapter in BDA3 are listed on the course web page. These will help you gain a deeper understanding of the topic.
- Common questions and answers regarding installation and technical problems can be found in Frequently Asked Questions (FAQ).
- Deadlines for all assignments can be found on the course web page and in Peergrade. You can set email alerts for the deadlines in Peergrade settings.
- You are allowed to discuss assignments with your friends, but it is not allowed to copy solutions directly from other students or from the internet.
- You can copy, e.g., plotting code from the course demos, but really try to solve the actual assignment problems with your own code and explanations.
- Do not share your answers publicly.
- Do not copy answers from the internet or from previous years. We compare the answers to the answers from previous years and to the answers from other students this year.
- Use of AI is allowed on the course, but most of the work needs to be done by the student, and you need to report whether you used AI and in which way you used it (see points 5 and 6 in the Aalto guidelines for use of AI in teaching).
- All suspected plagiarism will be reported and investigated. See more about the Aalto University Code of Academic Integrity and Handling Violations Thereof.
- Do not submit empty PDFs, almost empty PDFs, copies of the questions, or nonsense generated by yourself or by AI, as these harm other students, who cannot peergrade empty or nonsense submissions. Violations of this rule will be reported and investigated in the same way as plagiarism.
- If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository!
2 Inference for normal mean and deviation (3 points)
A factory has a production line for manufacturing car windshields. A sample of windshields has been taken for testing hardness. The observed hardness values \(\mathbf{y}_1\) can be found in the dataset windshieldy1 in the aaltobda package.
We may assume that the observations follow a normal distribution with an unknown standard deviation \(\sigma\). We wish to obtain information about the unknown average hardness \(\mu\). For simplicity, we assume the standard uninformative prior discussed in the book, that is, \(p(\mu, \sigma) \propto \sigma^{-1}\). It is not necessary to derive the posterior distribution in the report, as it has already been done in the book (see Section 3.2).
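For reference, the standard results of BDA3 Section 3.2 under \(p(\mu, \sigma) \propto \sigma^{-1}\) are summarized below, with \(n\) observations, sample mean \(\bar{y}\) and sample variance \(s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2\). The second argument of \(t_{n-1}(\cdot, \cdot)\) is written here as the squared scale, and \(\tilde{y}\) denotes a single new observation. The predictive distribution is wider than the posterior of \(\mu\) because it adds the sampling variability \(\sigma^2\) of a new observation to the posterior uncertainty about \(\mu\).

\[
\begin{aligned}
\mu \mid y &\sim t_{n-1}\!\left(\bar{y},\ \frac{s^2}{n}\right),\\
\sigma^2 \mid y &\sim \text{Inv-}\chi^2\!\left(n-1,\ s^2\right),\\
\tilde{y} \mid y &\sim t_{n-1}\!\left(\bar{y},\ \left(1 + \frac{1}{n}\right)s^2\right).
\end{aligned}
\]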
Hint:
Posterior intervals are also called credible intervals and are different from confidence intervals.
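A minimal sketch of a 95% central posterior (credible) interval for \(\mu\), assuming the marginal posterior \(\mu \mid y \sim t_{n-1}(\bar{y}, s^2/n)\) from BDA3 Section 3.2. The helper name mu_posterior_interval is hypothetical, and the data values are made up; they are not the windshieldy1 data. Under this particular prior the interval happens to coincide numerically with the classical \(t\) confidence interval, but its interpretation, a probability statement about \(\mu\) given the observed data, is different.

```r
# Hypothetical helper: central posterior interval for mu under the prior
# p(mu, sigma) proportional to 1/sigma, i.e. mu | y ~ t_{n-1}(ybar, s^2 / n).
mu_posterior_interval <- function(y, prob = 0.95) {
  n <- length(y)
  scale <- sd(y) / sqrt(n)            # scale of the t_{n-1} marginal posterior of mu
  tail <- (1 - prob) / 2              # probability mass left out in each tail
  mean(y) + scale * qt(c(tail, 1 - tail), df = n - 1)
}

# Made-up hardness values (NOT the windshieldy1 data).
y <- c(13.2, 14.1, 14.8, 15.0, 13.9, 14.5, 15.3, 14.0)
res <- mu_posterior_interval(y)
res
```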
Hint:
Predictive intervals are different from posterior intervals.
With a conjugate prior, the closed-form posterior has Student's \(t\) form (see the equations in the book).
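A minimal sketch contrasting a predictive interval with a posterior interval, assuming the BDA3 Section 3.2 result \(\tilde{y} \mid y \sim t_{n-1}(\bar{y}, (1 + 1/n)s^2)\) for a single new observation. The Monte Carlo part draws \(\sigma^2\) from its scaled inverse-\(\chi^2\) posterior, then \(\mu\) given \(\sigma^2\), then \(\tilde{y}\) given \(\mu\) and \(\sigma\), and should approximately reproduce the closed-form interval. The helper name predictive_interval and the data values are hypothetical.

```r
# Hypothetical helper: central posterior predictive interval for one new
# observation under p(mu, sigma) proportional to 1/sigma.
predictive_interval <- function(y, prob = 0.95) {
  n <- length(y)
  scale <- sd(y) * sqrt(1 + 1 / n)   # wider than the scale sd(y) / sqrt(n) used for mu
  tail <- (1 - prob) / 2
  mean(y) + scale * qt(c(tail, 1 - tail), df = n - 1)
}

# Made-up hardness values (NOT the windshieldy1 data).
y <- c(13.2, 14.1, 14.8, 15.0, 13.9, 14.5, 15.3, 14.0)
pred <- predictive_interval(y)

# Monte Carlo check by composition: sigma^2, then mu | sigma^2, then y_tilde.
set.seed(1)
S <- 1e5
n <- length(y)
sigma2 <- (n - 1) * var(y) / rchisq(S, df = n - 1)   # Inv-chi^2(n - 1, s^2) draws
mu <- rnorm(S, mean = mean(y), sd = sqrt(sigma2 / n))
y_tilde <- rnorm(S, mean = mu, sd = sqrt(sigma2))
mc <- quantile(y_tilde, c(0.025, 0.975))

pred
mc
```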
3 Inference for the difference between proportions (3 points)
An experiment was performed to estimate the effect of beta-blockers on mortality of cardiac patients. A group of patients was randomly assigned to treatment and control groups: out of 674 patients receiving the control, 39 died, and out of 680 receiving the treatment, 22 died. Assume that the outcomes are independent and binomially distributed, with probabilities of death of \(p_0\) and \(p_1\) under the control and treatment, respectively. Set up a noninformative or weakly informative prior distribution on \((p_0,p_1)\).
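With independent \(\text{Beta}(\alpha, \beta)\) priors on \(p_0\) and \(p_1\) (for example \(\alpha = \beta = 1\), a uniform prior; this particular choice is an assumption, as the assignment leaves the prior to you), conjugacy gives independent Beta posteriors. The odds ratio is written below in its usual form, treatment odds over control odds, which is assumed to match the odds ratio equation referred to in the hint.

\[
\begin{aligned}
p_0 \mid y &\sim \text{Beta}(\alpha + 39,\ \beta + 674 - 39) = \text{Beta}(\alpha + 39,\ \beta + 635),\\
p_1 \mid y &\sim \text{Beta}(\alpha + 22,\ \beta + 680 - 22) = \text{Beta}(\alpha + 22,\ \beta + 658),\\
\text{OR} &= \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}.
\end{aligned}
\]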
Hint:
With a conjugate prior, the closed-form posterior for each group separately is a Beta distribution (see the equations in the book). You can use rbeta() to sample from the posterior distributions of \(p_0\) and \(p_1\), and use these samples and the odds ratio equation to get a sample from the posterior distribution of the odds ratio.
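A minimal sketch of the simulation described in the hint, assuming independent uniform \(\text{Beta}(1, 1)\) priors on \(p_0\) and \(p_1\) (one possible noninformative choice) and the usual odds ratio, treatment odds over control odds. The counts are the ones given in the problem statement; the variable names are illustrative.

```r
set.seed(4711)
S <- 1e5                                   # number of posterior draws
a_prior <- 1; b_prior <- 1                 # assumed Beta(1, 1) prior for both groups
p0 <- rbeta(S, a_prior + 39, b_prior + 674 - 39)   # control: 39 deaths out of 674
p1 <- rbeta(S, a_prior + 22, b_prior + 680 - 22)   # treatment: 22 deaths out of 680
odds_ratio <- (p1 / (1 - p1)) / (p0 / (1 - p0))    # one odds ratio per joint draw

mean(odds_ratio)                           # posterior mean of the odds ratio
quantile(odds_ratio, c(0.025, 0.975))      # 95% central posterior interval
mean(odds_ratio < 1)                       # posterior probability that the odds ratio is below 1
```

Because each draw of \(p_0\) is paired with a draw of \(p_1\), the resulting odds_ratio vector is a sample from the posterior of the odds ratio itself, so any summary (mean, interval, tail probability) can be read off it directly.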
4 Inference for the difference between normal means (3 points)
Consider a case where the same factory has two production lines for manufacturing car windshields. Independent samples from the two production lines were tested for hardness. The hardness measurements for the two samples \(\mathbf{y}_1\) and \(\mathbf{y}_2\) can be found in the datasets windshieldy1 and windshieldy2 in the aaltobda package.
We assume that the samples have unknown standard deviations \(\sigma_1\) and \(\sigma_2\). Use uninformative or weakly informative priors and answer the following questions:
Hint:
With a conjugate prior, the closed-form posterior for each group separately has Student's \(t\) form (see the equations in the book). You can use the rtnew() function to sample from the posterior distributions of \(\mu_1\) and \(\mu_2\), and use these samples to get a sample from the posterior distribution of the difference \(\mu_d = \mu_1 - \mu_2\).
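A minimal sketch, assuming independent \(p(\mu_j, \sigma_j) \propto \sigma_j^{-1}\) priors for the two lines, so that each \(\mu_j \mid \mathbf{y}_j \sim t_{n_j-1}(\bar{y}_j, s_j^2/n_j)\) independently. The difference of two independent \(t\) variables with different scales has no simple closed form (the Behrens-Fisher problem), which is why simulation is convenient. The rtnew() function from the hint presumably draws from such a location-scale \(t\); here base R's rt() is shifted and scaled instead so the sketch runs without aaltobda. The helper draw_mu and both data vectors are hypothetical stand-ins for windshieldy1 and windshieldy2.

```r
# Hypothetical helper: S draws from mu | y ~ t_{n-1}(mean(y), var(y) / n),
# built as location + scale * standard t draw.
draw_mu <- function(y, S) {
  n <- length(y)
  mean(y) + sd(y) / sqrt(n) * rt(S, df = n - 1)
}

set.seed(2024)
# Made-up values standing in for windshieldy1 and windshieldy2.
y1 <- c(13.2, 14.1, 14.8, 15.0, 13.9, 14.5, 15.3, 14.0)
y2 <- c(15.1, 15.8, 14.9, 16.2, 15.5, 15.0, 16.0, 15.4, 15.7)
S <- 1e5
mu_d <- draw_mu(y1, S) - draw_mu(y2, S)   # draws from the posterior of mu_1 - mu_2

mean(mu_d)                                # posterior mean of the difference
quantile(mu_d, c(0.025, 0.975))           # 95% central posterior interval
mean(mu_d < 0)                            # posterior probability that mu_1 < mu_2
```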